Add weight tying support for Llama3 #2580
Merged
tianyu-l merged 3 commits into pytorch:main · Mar 24, 2026
Conversation
tianyu-l (Contributor) requested changes on Mar 15, 2026 and left a comment:
IIUC the existing model registry doesn't have the Llama 3.2 1B / 3B models, which are the only variants that have weight tying enabled. Please add those models to llama3/__init__.py. You can refer to the exact config in the earlier attempt #1376.
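A minimal sketch of what such registry entries could look like. The dataclass and field names are stand-ins for illustration (not torchtitan's actual `TransformerModelArgs`), and the hyperparameter values are assumptions based on the public Llama 3.2 model cards rather than the config in #1376:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the repo's model-args dataclass; field names
# are assumptions for illustration only.
@dataclass
class ModelArgs:
    dim: int
    n_layers: int
    n_heads: int
    n_kv_heads: int
    vocab_size: int = 128256
    rope_theta: float = 500000.0
    enable_weight_tying: bool = False

# Llama 3.2 1B / 3B -- the only Llama variants that tie
# tok_embeddings.weight to output.weight.
llama3_configs = {
    "1B": ModelArgs(dim=2048, n_layers=16, n_heads=32, n_kv_heads=8,
                    enable_weight_tying=True),
    "3B": ModelArgs(dim=3072, n_layers=28, n_heads=24, n_kv_heads=8,
                    enable_weight_tying=True),
}
```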
Ties tok_embeddings.weight to output.weight via an enable_weight_tying config flag. Follows the same pattern as Qwen3 (pytorch#1590). Closes pytorch#1524.
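The tying pattern itself is small; here is a minimal sketch on a toy module (the class is illustrative, only the attribute names mirror the PR description):

```python
import torch.nn as nn

class TinyTransformer(nn.Module):
    """Toy model showing the weight-tying pattern described above."""

    def __init__(self, vocab_size: int, dim: int, enable_weight_tying: bool = False):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)   # (vocab, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)  # weight: (vocab, dim)
        if enable_weight_tying:
            # Rebind to a single shared Parameter: gradients from both
            # uses accumulate into one tensor, and checkpoints carry
            # only one copy of the matrix.
            self.output.weight = self.tok_embeddings.weight
```

Because the shapes already agree ((vocab_size, dim) on both sides), the assignment is a plain Parameter rebind with no transpose needed.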
Llama 3.2 1B and 3B are the only Llama variants with weight tying, so they belong in the registry; without them the feature has no real entry point. I also dropped the try/except guard in test_weight_tying.py, which was inconsistent with the other unit tests here and silently skipped on broken imports.
Force-pushed from 8d7f787 to ceda986.
Contributor
please fix tests
pytorch-bot bot pushed a commit that referenced this pull request on Mar 27, 2026:
Implements enable_weight_tying for Llama3, sharing tok_embeddings.weight with output.weight. It mirrors the Qwen3 implementation from #1590 (thanks!). Changes cover model.py (config field, tying in __init__/init_weights, PP guard), parallelize.py (grouped FSDP unit for tied params), state_dict_adapter.py (skip/reconstruct output.weight for HF conversion), and a new unit test file. Closes #1524.
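The state-dict skip/reconstruct step can be sketched as plain dict manipulation: when weights are tied, the HF checkpoint stores only the embedding, so output.weight is dropped on export and re-pointed on import. Function names and key names below are illustrative assumptions, not torchtitan's actual adapter API:

```python
# Hypothetical sketch of the skip/reconstruct idea for HF conversion.
TT_TO_HF = {"tok_embeddings.weight": "model.embed_tokens.weight"}
HF_TO_TT = {v: k for k, v in TT_TO_HF.items()}

def to_hf(state_dict: dict, enable_weight_tying: bool) -> dict:
    """Rename keys to HF convention, skipping the tied output.weight
    so the shared matrix is not stored twice."""
    hf_sd = {}
    for key, value in state_dict.items():
        if enable_weight_tying and key == "output.weight":
            continue  # tied to tok_embeddings.weight; HF keeps one copy
        hf_sd[TT_TO_HF.get(key, key)] = value
    return hf_sd

def from_hf(hf_sd: dict, enable_weight_tying: bool) -> dict:
    """Rename keys back, reconstructing output.weight from the embedding
    when tying is enabled and the HF checkpoint omits it."""
    sd = {HF_TO_TT.get(key, key): value for key, value in hf_sd.items()}
    if enable_weight_tying and "output.weight" not in sd:
        sd["output.weight"] = sd["tok_embeddings.weight"]
    return sd
```

A round trip through both directions should leave a tied model loadable with the shared tensor reconstructed.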